Abstract

This study investigates two-stage plans based on nonparametric procedures for estimating an
inverse regression function at a given point. Specifically, isotonic regression is used at stage one
to obtain an initial estimate followed by another round of isotonic regression in the vicinity of this
estimate at stage two. It is shown that such two stage plans accelerate the convergence rate of
one-stage procedures and are superior to existing two-stage procedures that use local parametric
approximations at stage two when the available budget is moderate and/or the regression function
is 'ill-behaved'. Both Wald and Likelihood Ratio type confidence intervals for the threshold value
of interest are investigated and the latter are recommended in applications due to their simplicity
and robustness. The developed plans are illustrated through a comprehensive simulation study and
an application to car fuel efficiency data.

1. INTRODUCTION 
Threshold estimation is a canonical statistical estimation problem with numerous applications in 
science and engineering. Here is an interesting motivating example. In recent years, an important 
consideration for both car manufacturers and potential buyers in the United States is the fuel 
efficiency of the vehicle, expressed in miles per gallon (MPG). The National Highway Traffic Safety
Administration (NHTSA) regulates the Corporate Average Fuel Economy (CAFE) standards to
encourage automobile manufacturers to improve the average fuel efficiency of their fleets of vehicles.
The CAFE standard for 2012 models is 29.8 MPG, set to increase to 34.3 MPG in 2016 according
to the Environmental Protection Agency rule that came into effect in August 2012. To encourage
higher fuel efficiency, manufacturers are subject to a penalty if the average FE of their fleets falls 
below the CAFE standard. Moreover, a so-called gas guzzler tax is imposed on cars with low FE 
in accordance with the US Energy Tax Act of 1978. 

The data on fuel efficiency as a function of the vehicle's horse power, which is a key component, 
for the 2012 models is shown in Figure 1 (for a detailed discussion of the data and CAFE standards, 
see section 4). An expected decreasing relationship is observed and it is of interest to identify the 
horse power threshold at which the fuel efficiency meets the current CAFE, as well as the 2016 
standard.

Figure 1: Scatterplots of the relationship between horse power and fuel efficiency of naturally
aspirated vehicles; FE = m(HP). The right panel shows the data with jittered horsepower to
create a unique horsepower for every observation.



The data plot indicates that fitting a precise parametric model may be challenging, while it
is rather straightforward to fit a monotonically decreasing nonparametric one and obtain the fuel
efficiency threshold for the target values of approximately 30 and 34 MPG. However, it is also desirable to assign
a confidence interval around the estimate and an interesting question addressed in this paper is
whether an adaptive procedure can lead to improved precision for such threshold estimates.

The use of adaptive procedures in a design setting for threshold estimation models has
been recently studied in the literature (Lan et al., 2009; Tang et al., 2011). The basic model
considered is $Y = m(X) + \epsilon$, where the design point $X$ takes values in $[a, b]$, the regression function
$m$ is monotone (henceforth assumed non-decreasing for ease of presentation), and the random
error $\epsilon$ has mean 0 and finite variance $\sigma^2$. The quantity to be estimated is a threshold $d_0$, which
in Lan et al. (2009) corresponded to a change-point (i.e. $m(x) = \alpha_0 1(x \le d_0) + \beta_0 1(x > d_0)$ with
unknown constants $\alpha_0$ and $\beta_0$), whereas in Tang et al. (2011) it corresponded to $d_0 = m^{-1}(\theta_0)$ for some prespecified
$\theta_0$.

The two-stage adaptive procedure employed in Lan et al. (2009) and Tang et al. (2011) works as
follows: (i) in the first stage, it utilizes a portion $p$ of the design budget to obtain an initial estimate
of $d_0$; (ii) in the second stage, the remaining portion $(1 - p)$ of the budget is used to obtain more
sample points in a small neighborhood of that estimate; (iii) finally, an improved estimate based on
the second-stage data is constructed. This more intense "zoom-in" sampling leads to accelerated
convergence rates of the second stage estimators for $d_0$ compared to the standard ones that use all
the data in one shot. Specifically, for the change point problem, the rate can be accelerated from
$n$ to almost $n^2$ (up to a logarithmic factor) as in Lan et al. (2009), while for the inverse regression
problem, from $n^{1/3}$ to $n^{1/2}$ by employing a local linear approximation (Tang et al., 2011), where
$n$ denotes the total budget available. Hence, tighter confidence intervals can be constructed that
have the correct nominal coverage with the same budget as standard one-stage procedures, or
alternatively one can reduce the design budget and still have good quality confidence intervals.

In this paper, given our motivating data application, we focus on the second problem, namely
that of estimating the inverse regression function at a prespecified point $\theta_0$. This is closely related to
dose response (Rosenberger and Haines, 2002) and statistical calibration studies (Osborne, 1991)
(for additional references see Tang et al. (2011)). As mentioned above, a strategy that obtains
a first stage estimate using isotonic regression, followed by a local linear approximation, gives a
consistent second stage estimator that achieves the parametric rate $\sqrt{n}$. However, the success of
this strategy heavily hinges upon the (approximate) linearity of the regression function $m(\cdot)$ in the
vicinity of $d_0$. Small departures from linearity do not adversely affect the results (especially when
the budget is large enough to allow for significant "zooming-in" at the second stage), but severe
departures are a totally different matter as illustrated next.

Consider a monotone regression function exhibiting strong nonlinearity at $d_0 = 0.5$, for example,
given by $m(x) = (1/40)\sin(6\pi x) + 1/4 + (1/2)x + (1/4)x^2$ with $x \in [0, 1]$ (see the left panel of Figure
4 for its plot). The coverage rates and average lengths of the confidence intervals obtained from
the two-stage adaptive strategy based on isotonic regression and a local linear approximation for
selected total budget sizes ($n = 100, 300, 500$), varying portions $p$ allocated to the first stage and
different noise levels ($\sigma = 0.1, 0.3, 0.5$) are depicted in Figure 2.

Figure 2: The left panel shows the coverage rates of the 95% confidence intervals using a local
linear approximation for $d_0 = 0.5$ with different sample sizes and noise levels. The right panel
shows the corresponding average lengths of the intervals.



It can be seen that for the majority of portions $p$, the confidence intervals constructed from
this adaptive strategy fail miserably in terms of coverage rates and exhibit relatively large lengths.
Of interest is the fact that for large $p$, the coverage rates indeed approach the nominal level. In
practice, however, it is not possible to choose an appropriate $p$ without prior information on $m$.
Further, even for large values of $p$, the confidence intervals are excessively wide, especially for large noise
levels.

In contrast, a two stage adaptive strategy based on employing isotonic regression at both stages, 
which will be fully developed in this paper, overcomes these difficulties. Such a strategy would be 
also desirable for the motivating data application, due to high variability in the vicinity of the 
CAFE thresholds, as seen in the scatterplots of Figure 1. In Figure 3, the coverage rates and 
average lengths using our new strategy are shown for the same settings as above, but with p = 1/4 
(for more on this universal choice of p see Section 2). It can be seen that this wholly nonparametric 
strategy overcomes the previous difficulties, proves robust to the level of local nonlinearity of the 
regression function m and, as argued in Section 2, is easy to implement. 

Figure 3: The left panel shows the coverage rates of the 95% confidence intervals using a two
stage procedure for $d_0 = 0.5$ with different sample sizes and noise levels. The right panel shows
the corresponding average lengths of the intervals.



The remainder of the paper is organized as follows: in Section 2 the adaptive procedure is 
introduced and the main results presented. Section 3 presents extensive simulation results, while
an interesting application of the methodology to fuel efficiency data is shown in Section 4. Section 
5 concludes. Proofs are sketched in the appendix. 



2. TWO STAGE ADAPTIVE PROCEDURES 

2.1 An Overview of the Isotonic Regression Procedure 

We provide a brief description of the one-stage isotonic regression procedure (OSIRP). Specifically,
given $n$ fixed or random design points $\{X_i\}_{i=1}^n$ in $[a, b]$ distributed according to a continuous design
density $g$ and the corresponding responses $\{Y_i\}_{i=1}^n$ obtained from the proposed model, the isotonic
regression estimate of $m(\cdot)$ is given by

$$\hat{m}_I(x) = m_1^* 1\{x \in [a, X_{(1)})\} + \sum_{i=1}^{n-1} m_i^* 1\{x \in [X_{(i)}, X_{(i+1)})\} + m_n^* 1\{x \in [X_{(n)}, b]\}, \qquad (2.1)$$

where $X_{(1)} \le \dots \le X_{(n)}$ are the ordered design points, $Y_{(i)}$ is the response paired with $X_{(i)}$, and
$\{m_i^*\}_{i=1}^n = \arg\min_{m_1 \le m_2 \le \dots \le m_n} \sum_{i=1}^n (Y_{(i)} - m_i)^2$. This minimizer exists uniquely, has a nice geometric
characterization as the slope of the greatest convex minorant of a stochastic process and is readily
computable using the pool adjacent violators algorithm (PAVA) (see, for example, Robertson et
al. (1988)). Then, for a prespecified value $\theta_0 \in (m(a), m(b))$, the one-stage isotonic regression
estimator of $d_0$ is defined by

$$\hat{d}_I = \hat{m}_I^{-1}(\theta_0) = \inf\{x \in [a, b] : \hat{m}_I(x) \ge \theta_0\}, \qquad (2.2)$$

where $\inf\{\emptyset\} = b$. Under mild conditions on the regression function and the design density, namely
Assumption A: $m$ is once continuously differentiable in a neighborhood of $d_0$ with positive derivative
$m'(d_0)$ and $g$ is positive and continuous at $d_0$,
the asymptotic distribution of $\hat{d}_I$ is given by (see Tang et al. (2011)):

$$n^{1/3}(\hat{d}_I - d_0) \xrightarrow{d} C_{d_I}\, g(d_0)^{-1/3} Z, \qquad (2.3)$$

where $C_{d_I} = \left(4\sigma^2 / m'(d_0)^2\right)^{1/3}$ and $Z$ follows the standard Chernoff distribution (Groeneboom and
Wellner (2001)). This result can be used to construct a $1 - \alpha$ Wald-type confidence interval for $d_0$:

$$\hat{d}_I \pm n^{-1/3}\, \hat{C}_{d_I}\, \hat{g}(d_0)^{-1/3}\, q(Z, 1 - \alpha/2),$$

where the hats denote consistent estimates and $q(\xi, r)$ is the lower $r$th quantile of a random variable $\xi$.
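
As an illustration of (2.1) and (2.2), the following minimal Python sketch computes the isotonic fit by pooling adjacent violators and then inverts it at $\theta_0$. The function names `pava` and `inverse_isotonic` are ours and purely illustrative; the sketch assumes no ties among the design points and is not the implementation used for the results reported in this paper.

```python
def pava(y):
    """Non-decreasing least-squares fit to y (responses ordered by design point)."""
    blocks = []  # each block is [sum of responses, number of points]
    for value in y:
        blocks.append([value, 1])
        # Pool adjacent blocks while the left block mean exceeds the right one.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [total / count for total, count in blocks for _ in range(count)]


def inverse_isotonic(x, y, theta0, a, b):
    """Estimate d_0 as in (2.2): the smallest point of [a, b] where the fit is >= theta0."""
    pairs = sorted(zip(x, y))              # order the data by design point
    fit = pava([yi for _, yi in pairs])
    if fit and fit[0] >= theta0:
        return a                           # the fit equals m_1^* from a onwards
    for (xi, _), mi in zip(pairs, fit):
        if mi >= theta0:
            return xi                      # first jump point reaching theta0
    return b                               # convention inf{empty set} = b
```

For example, with design points 0.1, 0.2, ..., 0.6 and responses 0.1, 0.3, 0.2, 0.5, 0.4, 0.8, the violating pairs are pooled into their averages and the estimate of $m^{-1}(0.4)$ is the design point 0.4.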

An alternative is to construct confidence intervals through likelihood ratio (LR) testing. Specifically,
the hypotheses of interest are

$$H_0 : m^{-1}(\theta_0) = d_0 \quad \text{versus} \quad H_a : m^{-1}(\theta_0) \neq d_0. \qquad (2.4)$$

Then, the LR test statistic is given by

$$2\log\lambda_I = 2\log\lambda_I(d_0) = 2\left[l_n(\hat{m}_I, \hat{\sigma}) - l_n(\hat{m}_{IC}, \hat{\sigma})\right], \qquad (2.5)$$

where $l_n(m, \sigma) = -(2\sigma^2)^{-1}\sum_{i=1}^n (Y_i - m(X_i))^2$, $\hat{m}_{IC}$ is the constrained isotonic regression estimate of $m$
under $H_0$ and $\hat{\sigma}$ a consistent estimate of $\sigma$. It is known that $\hat{m}_{IC}$ uniquely exists (see Banerjee
(2000)). The asymptotic distribution of $2\log\lambda_I$ under $H_0$ is given in Banerjee (2009): $2\log\lambda_I \xrightarrow{d} D$,
where $D$ is a 'universal' random variable not depending on the parameters of the model (Banerjee
and Wellner (2001)). This result allows us to construct a $1 - \alpha$ LR-type confidence region for $d_0$:

$$\{x \in [a, b] : 2\log\lambda_I(x) \le q(D, 1 - \alpha)\}. \qquad (2.6)$$

The LR-type confidence region can be shown to be an interval and is typically asymmetric around
$\hat{d}_I$, unlike the Wald-type one. Its main advantage is that only $\sigma$ needs to be estimated for its
construction, whereas for the Wald confidence interval, estimation of $m'(d_0)$ is also needed, a
significantly more involved task.

Remark 2.1. The use of the term LR statistic in connection with (2.5) needs to be clarified. Under a
normality assumption on the errors, $2\log\lambda_I$ is, indeed, a proper likelihood ratio statistic; otherwise,
it is more accurately a residual sum of squares statistic which can be interpreted as a 'working
likelihood ratio statistic' where the normal likelihood is used as a working likelihood. In this paper,
we do not assume normality of errors but continue to use the term LR statistic for $2\log\lambda_I$ in the
above sense.
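
To make the residual sum of squares interpretation concrete, the sketch below evaluates the statistic in (2.5) as $(\mathrm{RSS}_{\text{constrained}} - \mathrm{RSS}_{\text{unconstrained}})/\hat{\sigma}^2$. It relies on an assumption that is not spelled out in the text above: that the constrained fit under $H_0$ can be obtained by running isotonic regression separately on the points at or to the left of the hypothesized threshold and on those to its right, and then truncating the left fit from above and the right fit from below at $\theta_0$. The names `lr_statistic` and `pava` are illustrative, and `sigma` stands for whatever consistent estimate $\hat{\sigma}$ is available.

```python
def pava(y):
    """Non-decreasing least-squares fit to y (responses ordered by design point)."""
    blocks = []  # each block is [sum of responses, number of points]
    for value in y:
        blocks.append([value, 1])
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [total / count for total, count in blocks for _ in range(count)]


def lr_statistic(x, y, d, theta0, sigma):
    """Working LR statistic for H0: m^{-1}(theta0) = d, written as an RSS difference."""
    pairs = sorted(zip(x, y))
    ys = [yi for _, yi in pairs]
    unconstrained = pava(ys)
    k = sum(1 for xi, _ in pairs if xi <= d)          # points at or left of d
    # Assumed characterization: truncate the separate left and right fits at theta0.
    left = [min(v, theta0) for v in pava(ys[:k])]
    right = [max(v, theta0) for v in pava(ys[k:])]
    constrained = left + right

    def rss(fit):
        return sum((yi - fi) ** 2 for yi, fi in zip(ys, fit))

    return (rss(constrained) - rss(unconstrained)) / sigma ** 2
```

A confidence region of the form (2.6) would then collect the values `d` on a grid for which `lr_statistic` does not exceed the relevant quantile of $D$; that quantile is not tabulated here and must be supplied separately.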

2.2 Adaptive Two-Stage Procedures 

As noted in the Introduction, adaptive two stage procedures can lead to accelerated convergence rates
and hence to sharper confidence intervals for $d_0$. The main steps of such a two-stage fully nonparametric
procedure are outlined next:

1. Denote by $p \in (0, 1)$ the sample proportion to be allocated in the first stage and by $n_1 = \lfloor np \rfloor$
and $n_2 = n - n_1$ the corresponding first and second stage sample sizes, respectively.

2. Generate the first stage data $\{(X_{1,i}, Y_{1,i})\}_{i=1}^{n_1}$ with a design density $g_1$ on $[a, b]$. Then, compute
a first stage monotone non-parametric estimator $\hat{m}_{1,I}$ of $m$ and obtain the corresponding first
stage estimator $\hat{d}_{1,I} = \hat{m}_{1,I}^{-1}(\theta_0)$ of $d_0$ for a prespecified value $\theta_0$.

3. Specify the second stage sampling interval $[L_1, U_1] = [\hat{d}_{1,I} \pm C_1 n_1^{-\gamma_1}] \cap [a, b]$, where $C_1 > 0$ and
$0 < \gamma_1 < \gamma^* \le 1/2$, $\gamma^*$ being the convergence rate of $\hat{d}_{1,I}$.

4. Obtain the second stage data $\{(X_{2,i}, Y_{2,i})\}_{i=1}^{n_2}$ with a design density $g_2$ on $[L_1, U_1]$. Employ
these data and a non-parametric procedure (which could be different from the one used
previously) to compute a monotone second stage estimator $\hat{m}_{2,I}$ and, as in the first stage, the
corresponding $\hat{d}_{2,I}$.

5. Construct confidence intervals for $d_0$ using the asymptotic distribution of $\hat{d}_{2,I}$.

Remark 2.2. Choosing $\gamma_1 < \gamma^*$ ensures that the stage two sampling interval contains $d_0$ with
probability going to 1.
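
Steps 1 to 4 above can be sketched in a few lines of Python for a simulated experiment with isotonic regression at both stages. This is only an illustration under our own assumptions: uniform design densities at both stages, Gaussian errors, user-supplied values of $C_1$ and $\gamma_1$, and hypothetical function names (`two_stage_ir`, `inverse_isotonic`, `pava`). Step 5, the construction of confidence intervals, is not covered.

```python
import math
import random


def pava(y):
    """Non-decreasing least-squares fit to y (responses ordered by design point)."""
    blocks = []  # each block is [sum of responses, number of points]
    for value in y:
        blocks.append([value, 1])
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [total / count for total, count in blocks for _ in range(count)]


def inverse_isotonic(x, y, theta0, lo, hi):
    """Smallest point of [lo, hi] at which the isotonic fit reaches theta0, else hi."""
    pairs = sorted(zip(x, y))
    fit = pava([yi for _, yi in pairs])
    if fit and fit[0] >= theta0:
        return lo
    for (xi, _), mi in zip(pairs, fit):
        if mi >= theta0:
            return xi
    return hi


def two_stage_ir(m, theta0, n, p, c1, gamma1, sigma, a=0.0, b=1.0, rng=None):
    """Simulate Steps 1-4 with uniform designs; returns (d1, (L1, U1), d2)."""
    rng = rng or random.Random(0)

    def sample(lo, hi, k):
        return [lo + (hi - lo) * rng.random() for _ in range(k)]

    def respond(xs):
        return [m(x) + rng.gauss(0.0, sigma) for x in xs]

    n1 = math.floor(n * p)                  # Step 1: split the budget
    n2 = n - n1
    x1 = sample(a, b, n1)                   # Step 2: first stage estimate
    d1 = inverse_isotonic(x1, respond(x1), theta0, a, b)
    half = c1 * n1 ** (-gamma1)             # Step 3: zoom-in interval
    lo, hi = max(a, d1 - half), min(b, d1 + half)
    x2 = sample(lo, hi, n2)                 # Step 4: second stage estimate
    d2 = inverse_isotonic(x2, respond(x2), theta0, lo, hi)
    return d1, (lo, hi), d2
```

A call such as `two_stage_ir(lambda x: x * x, 0.25, 2000, 0.25, 1.0, 0.3, 0.1)` targets $d_0 = 0.5$ for the quadratic function; the returned second stage estimate always lies in the reported interval $[L_1, U_1]$.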

2.3 Asymptotic Properties of Two-Stage Estimators 

We discuss the properties of the two-stage procedure, where isotonic regression is employed in both
stages (henceforth, IR+IR).

Proposition 1. Consider the IR+IR procedure. Let the design density at stage two be given by
$g_2(x) = (C_1 n_1^{-\gamma_1})^{-1}\, \psi\big((x - \hat{d}_{1,I})/(C_1 n_1^{-\gamma_1})\big)$, where $\psi$ is a Lebesgue density on $[-1, 1]$ that is positive at 0
and continuous in a neighborhood of 0. Thus, $g_2$ is simply $\psi$ renormalized to the sampling interval
at stage two. Assume that $m'$, the derivative of $m$, exists and is continuous in a neighborhood of
$d_0$ and $m'(d_0) > 0$. Let $\hat{d}_{2,I} = \hat{m}_{2,I}^{-1}(\theta_0)$, where $\hat{m}_{2,I}$ is the isotonic estimator of $m$ constructed from
the second stage data. Then, $n^{(1+\gamma_1)/3}(\hat{d}_{2,I} - d_0) \xrightarrow{d} C_{d_{2,I}} Z$, where $C_{d_{2,I}} = C_{d_I} \left( \frac{C_1}{(1-p)\,p^{\gamma_1}\,\psi(0)} \right)^{1/3}$.



From Proposition 1, a Wald-type $1 - \alpha$ asymptotic confidence interval for $d_0$ is given by

$$\left[\hat{d}_{2,I} \pm n^{-(1+\gamma_1)/3}\, \hat{C}_{d_{2,I}}\, q(Z, 1 - \alpha/2)\right]. \qquad (2.7)$$

Remark 2.3. A consequence of the accelerated rate of convergence obtained with the IR+IR strategy
is that the asymptotic relative efficiency (ARE) of the two-stage estimator $\hat{d}_{2,I}$ with respect to the
one-stage estimator $\hat{d}_I$ is

$$\mathrm{ARE}(\hat{d}_{2,I}, \hat{d}_I) = \frac{\mathrm{s.d.}(\hat{d}_I)}{\mathrm{s.d.}(\hat{d}_{2,I})} = \left( \frac{(1-p)\,p^{\gamma_1}\,\psi(0)}{C_1\, g(d_0)} \right)^{1/3} n^{\gamma_1/3} \to \infty \quad \text{as } n \to \infty.$$

Note that, in the generic description of the two-stage procedure above, we use a confidence
interval for $d_0$ that relies on the asymptotic distribution of a point estimate computed at stage two.
However, this is not the only way to proceed at stage two. Having collected the second stage data
at the beginning of Step 4, we can bypass point estimation altogether and construct a confidence
interval using likelihood ratio inversion. This alternative possibility is discussed below. Also, as will
be explained in the practical implementation, the construction of $[L_1, U_1]$ is achieved in practice by
constructing a high probability confidence interval for $d_0$ from the stage one data. This also opens
up the possibility of bypassing point estimates at stage one in favor of a likelihood ratio inversion
based confidence interval, a point that we come to later.

An alternative LR-type CI can be constructed as follows: the LR-type test statistic at stage two
for testing $H_0 : d_0 = m^{-1}(\theta_0)$ is

$$2\log\lambda_{2,I} = 2\log\lambda_{2,I}(d_0) = 2\left[l_n(\hat{m}_{2,I}, \hat{\sigma}) - l_n(\hat{m}_{2,IC}, \hat{\sigma})\right], \qquad (2.8)$$

where $l_n(m, \sigma) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n_2}(Y_{2,i} - m(X_{2,i}))^2$, $\hat{m}_{2,IC}$ is the constrained estimator of $m$ under the
null hypothesis $H_0$ and $\hat{\sigma}$ is a consistent estimate of $\sigma$.

Proposition 2. Under the assumptions of Proposition 1, and the null hypothesis $H_0 : m^{-1}(\theta_0) = d_0$
holding true, we have $2\log\lambda_{2,I} \xrightarrow{d} D$, where $D$ is as before.

Finally, from Proposition 2, an LR-type $(1 - \alpha)$ asymptotic confidence interval for $d_0$ is given
by

$$\{x \in [a, b] : 2\log\lambda_{2,I}(x) \le q(D, 1 - \alpha)\}. \qquad (2.9)$$

For the theoretical derivations in connection with Propositions 1 and 2, see the Appendix.

Remark 2.4. We have focused on the case of two-stage adaptive designs and the acceleration of
the convergence rate by an IR+IR strategy. Obviously, one can extend it to multiple stages and
continue using isotonic regression. As outlined in Section S1 in the Supplement, it can be established
that the convergence rate of such a procedure would come arbitrarily close to the $\sqrt{n}$ parametric
rate, if enough stages are employed, but would not achieve it.
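
A back-of-the-envelope calculation suggests why the rate approaches but never reaches $\sqrt{n}$. The sketch below assumes, as our own reading of Proposition 1 rather than a result quoted from the Supplement, that each additional isotonic stage turns a rate exponent $r$ into a limiting exponent $(1 + r)/3$, since $\gamma_1$ may be taken arbitrarily close to the rate of the preceding stage. The function name `limiting_rate_exponents` is illustrative.

```python
from fractions import Fraction


def limiting_rate_exponents(stages):
    """Limiting exponents r_k in the rate n^{r_k} after k = 1, ..., stages isotonic stages.

    Assumes the recursion r_{k+1} = (1 + r_k) / 3 suggested by Proposition 1,
    started from the one-stage rate exponent r_1 = 1/3.
    """
    rates = [Fraction(1, 3)]
    for _ in range(stages - 1):
        rates.append((1 + rates[-1]) / 3)
    return rates
```

Under this assumption the exponents are 1/3, 4/9, 13/27 and so on; the fixed point of the recursion is 1/2, and the gap to 1/2 shrinks by a factor of 3 with each stage without ever vanishing.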

2.4 Implementation Issues 

We discuss, next, the main steps for implementing the IR+IR strategy in practice. Specifically, we
address the following: (i) estimation of $\sigma^2$, (ii) estimation of $m'$, (iii) determination of the second stage
sampling interval $[L_1, U_1]$, (iv) the first stage sampling proportion $p$.

Implementation of IR+IR: For the estimation of $\sigma^2$ at Stage 1, we employ the nonparametric
estimator proposed by Gasser et al. (1986), and for the estimation of $m'(d_0)$, the local quadratic
regression procedure proposed by Fan and Gijbels (1996); some details are provided in Section 3.
Next comes the determination of the second stage sampling interval. Recall that the theoretical
formula for the interval is given by $[\hat{d}_{1,I} \pm C_1 n_1^{-\gamma_1}]$, with $C_1 > 0$ and $\gamma_1 \in (0, 1/3)$. While any
such interval will contain $d_0$ with probability going to 1 in the long run, in practice we would
like to ensure that our prescribed sampling interval $[L_1, U_1]$ does trap $d_0$ with high probability.
The practical determination of $[L_1, U_1]$ is therefore achieved through a high probability confidence
interval for $d_0$ from Stage 1 data. Consider the following $1 - \beta$ Wald-type confidence interval

$$\left[\hat{d}_{1,I} \pm n_1^{-1/3}\, \hat{C}_{d_I}\, g_1(\hat{d}_{1,I})^{-1/3}\, q(Z, 1 - \beta/2)\right] \cap [a, b], \qquad (2.10)$$

where the computation of $\hat{C}_{d_I}$ involves estimating both $\sigma^2$ and $m'(d_0)$ and where $\beta$ is a small positive
number such as 0.01. Using this, in practice, as $[L_1, U_1]$ amounts to choosing $C_1$ and $\gamma_1$ such that
$C_1 n_1^{-\gamma_1} = n_1^{-1/3}\, \hat{C}_{d_I}\, g_1(\hat{d}_{1,I})^{-1/3}\, q(Z, 1 - \beta/2)$. That is, $\gamma_1 = 1/3$ and $C_1 = \hat{C}_{d_I}\, g_1(\hat{d}_{1,I})^{-1/3}\, q(Z, 1 - \beta/2)$.
Although $\gamma_1 = 1/3$ is not in $(0, 1/3)$ as required by our theoretical results, it nevertheless
provides a good approximation in practice, since it lies at the boundary of that interval.

As far as the first stage sampling proportion is concerned, one would like to choose this in such a
way as to increase the precision of the second stage isotonic estimator. Proposition 1 shows that the
second stage estimator is asymptotically unbiased and that its standard deviation is proportional
to $\{(1 - p)\,p^{\gamma_1}\}^{-1/3}$. For a fixed $\gamma_1$, this is minimized when $\log(1 - p) + \gamma_1 \log p$ is maximized,
which happens when $p = p_{opt} = \gamma_1/(1 + \gamma_1)$. As $\gamma_1$ approaches 1/3, $p_{opt}$ approaches 1/4. Thus, the
optimal practical allocation of budget at Stage 1 is 25%.
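
The allocation rule is easy to check numerically. The short sketch below (function names are ours) evaluates the factor $\{(1 - p)\,p^{\gamma_1}\}^{-1/3}$ on a grid of values of $p$ and compares the grid minimizer with the closed form $\gamma_1/(1 + \gamma_1)$, which follows from setting the derivative of $\log(1 - p) + \gamma_1 \log p$, namely $-1/(1 - p) + \gamma_1/p$, equal to zero.

```python
def stage2_scale(p, gamma1):
    """Factor {(1 - p) p^gamma1}^(-1/3) multiplying the second stage standard deviation."""
    return ((1.0 - p) * p ** gamma1) ** (-1.0 / 3.0)


def optimal_p(gamma1):
    """Closed-form minimizer p_opt = gamma1 / (1 + gamma1)."""
    return gamma1 / (1.0 + gamma1)


def grid_minimizer(gamma1, steps=9999):
    """Brute-force minimizer of stage2_scale over an equally spaced grid in (0, 1)."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return min(grid, key=lambda p: stage2_scale(p, gamma1))
```

With `gamma1 = 1/3` both `optimal_p` and `grid_minimizer` point to a first stage share of one quarter, in line with the 25% allocation stated above.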

From Stage 2 data, we can construct a confidence interval for $d_0$ based on $\hat{d}_{2,I}$ following Proposition 1,
in which case both $m'(d_0)$ and $\sigma^2$ need to be updated. Alternatively, we can use likelihood
ratio inversion to get a CI of desired coverage for $d_0$, following Proposition 2, in which case only
the estimate of $\sigma^2$ needs to be updated. Finally, a third option is to bypass the estimation of
$m'(d_0)$ altogether by prescribing a high probability LR based confidence interval for $d_0$ as $[L_1, U_1]$
in Stage 1 and then using LR inversion at Stage 2 as well. While this procedure does not quite
fall within the purview of our theoretical results, it is a natural methodological choice; furthermore,
comparisons among these three approaches based on elaborate simulation studies demonstrate that
it is superior to the other two in practice.

3. PERFORMANCE EVALUATION OF THE ADAPTIVE PROCEDURES 
In this study, the following procedures are compared: (i) practical one-stage procedure based on
isotonic regression (POSIRP) with Wald and LR CIs, (ii) practical two-stage procedure based on
isotonic regression (PTSIRP-Wald) for both stages and using Wald CIs both for selecting $(L_1, U_1)$
and constructing the final CI, (iii) practical two-stage procedure (PTSIRP-LR) similar to (ii) but
employing LR CIs in both stages and (iv) the procedure from Tang et al. (2011) that uses isotonic
regression followed by a local linear approximation and bootstrapping for constructing CIs for $d_0$
(PABLTSP). The use of the qualifier 'Practical' before the various procedures above is to emphasize
the point that they involve estimates of nuisance parameters, as explained below.

Figure 4: The left plot shows the regression functions: sigmoid, quadratic and isotonic sine
functions. The right plot shows their derivatives.

The simulation settings are as follows: the design space is the $[0, 1]$ interval and the regression
functions considered are (i) the sigmoid function $m(x) = \exp(4(x - 0.5))/[1 + \exp(4(x - 0.5))]$, (ii) the
quadratic function $m(x) = x^2$ and (iii) the isotonic sine function $m(x) = (1/40)\sin(6\pi x) + 1/4 +
(1/2)x + (1/4)x^2$. The target point $d_0$ is 0.4, 0.5 or 0.6, while the random error follows a $N(0, \sigma^2)$
distribution, with $\sigma$ taking values 0.1 and 0.3. The total sample size $n$ ranges from 100 to 500
in increments of 100. All design densities $g$, $g_1$ and $g_2$ are uniform, while the confidence level for
all CIs is set to 0.95. The results presented are based on 1000 replicates. For PABLTSP, we set
the first stage sample proportion $p = 0.7$ in order to obtain accurate coverage rates for all functions.
(Yet, as shown in Figure 2, in some cases good coverage rates are achieved at the cost of large
average lengths.) For all other two stage procedures, we set $p$ to be the asymptotically optimal
proportion of 0.25. The quantiles of $D$ and $Z$ for constructing the second-stage sampling intervals
for PTSIRP are set to be 4 and 2, respectively, corresponding to $\beta = 0.01$.

When estimating $\sigma$ and $m'(d_0)$ in the second stage, only Stage 2 points are used in order to
stay strictly within the scope of the methods used for these purposes. With smaller budgets as in
the real data example, we follow the natural practice of combining both stage samples for updating
estimates of $\sigma$ and $m'(d_0)$, which makes second stage results more reliable.

To gain insight into the simulation results, we depict the plots of the functions under consideration
together with their derivatives (see Figure 4).

The coverage rates and average lengths of the 95% confidence intervals for $d_0$ are shown in
Figures 5 and 6.

Figure 5: The left and right panels show the coverage rates and average lengths of the 95% confidence
intervals for $d_0$ from the practical procedures with the sigmoid and quadratic functions and
different values of $\sigma$, $d_0$ and $n$.

It can be seen that for the quadratic and sigmoid functions, the proposed two-stage procedures
perform well, with the coverage being about the nominal level 95% for all values of $d_0$, sample sizes and
noise levels considered. Further, their average lengths are fairly comparable. In contrast, PABLTSP
shows inferior performance for larger noise and smaller sample sizes for the quadratic function.

The isotonic sine function proves the most challenging. The left panel of Figure 6 shows the
coverage rates of the practical procedures. As discussed in the introduction and seen in the figure,
this function exhibits strong nonlinearity, causing the local linear approximation PABLTSP to
feature very poor coverage rates. Also, note that for the case with $d_0 = 0.5$ the coverage rates of
the confidence intervals from POSIRP-Wald and PTSIRP-Wald are consistently lower than 95%.
This behavior is caused by inaccurate estimation of $m'(d_0)$ as illustrated in the Supplementary
material (Section S2, Figure 1). The true value of $m'(d_0)$ is around 0.279 and the corresponding
kernel estimators of $m'(d_0)$ are usually around 0.75, significantly larger than the true value. This
makes the confidence interval far too short to cover $d_0$ and consequently, the coverage rates behave
erratically.
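
The quoted value of the derivative, and its effect on the Wald interval, can be verified directly. The sketch below (function names are ours) differentiates the isotonic sine function analytically and uses the fact that, by (2.3), the Wald interval length is proportional to $m'(d_0)^{-2/3}$; it is an illustration only and does not reproduce the kernel estimation step.

```python
import math


def isotonic_sine(x):
    """m(x) = (1/40) sin(6 pi x) + 1/4 + x/2 + x^2/4 on [0, 1]."""
    return math.sin(6 * math.pi * x) / 40 + 0.25 + 0.5 * x + 0.25 * x ** 2


def isotonic_sine_derivative(x):
    """m'(x) = (6 pi / 40) cos(6 pi x) + 1/2 + x/2."""
    return (6 * math.pi / 40) * math.cos(6 * math.pi * x) + 0.5 + 0.5 * x


def wald_length_ratio(derivative_estimate, derivative_true):
    """Ratio of Wald interval lengths when m'(d_0) is replaced by an estimate.

    Uses only the m'(d_0)^(-2/3) dependence of the constant in (2.3).
    """
    return (derivative_true / derivative_estimate) ** (2.0 / 3.0)
```

Evaluating `isotonic_sine_derivative(0.5)` reproduces the value of about 0.279, and `wald_length_ratio(0.75, isotonic_sine_derivative(0.5))` shows that an estimate near 0.75 shrinks the interval to roughly half the length it should have.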

Full details for the estimation of $m'(d_0)$, which utilizes a local quadratic regression procedure,
are available in Section 4 of Tang et al. (2011). An asymptotically optimal bandwidth, given in
equation (3.20) on page 67 of Fan and Gijbels (1996), is employed for this purpose. This local
bandwidth minimizes the asymptotic MSE, and indeed with large sample sizes, we find that $m'(d_0)$
is estimated accurately and the coverage rates approach the nominal level. Further emphasizing the
importance of the derivative estimate and as illustrated in Figure 2 of the Supplementary material,
if we repeat the procedures with perfect knowledge of $m'(d_0)$, then coverage rates are about the
nominal level of 95% for the sample sizes considered.

Fortunately, for this wiggly isotonic sine function, POSIRP-LR and PTSIRP-LR have good 
coverage rates for all simulation cases. This indicates that LR-type confidence intervals are usually 
robust with different regression functions. The average lengths of the confidence intervals are shown 
in the right panel of Figure 6. Unsurprisingly, PTSIRP-LR achieves shorter average lengths since 
it is a two-stage procedure. 

In summary, we find that when the underlying regression function is well-behaved, the more aggressive
PABLTSP performs well. However, the conservative but stable PTSIRP-LR offers a robust
procedure that performs well, even when the underlying function exhibits strong nonlinearities.

4. AN APPLICATION TO FUEL EFFICIENCY STANDARDS 

As discussed in the Introduction, car fuel efficiency (FE) is an important issue for both manufacturers
and consumers, due to new CAFE standards. Note that while the CAFE standards are regulated
by the NHTSA, the vehicle FE is assessed by the Environmental Protection Agency (EPA). From
2008 onwards, the EPA measures the fuel efficiency of a vehicle in two testing modes: city and
highway, taking into consideration different speeds and acceleration, as well as air conditioning
usage and colder outside temperatures, in an effort to better approximate real-world fuel efficiency.
From the unadjusted city and highway fuel efficiency, the unadjusted combined fuel efficiency is
calculated as follows (see www.epa.gov):

$$\text{Combined FE} = \frac{1}{.495/\text{City FE} + .351/\text{Highway FE}} + .15.$$

Figure 6: The left panel shows the coverage rates of the 95% confidence intervals for $d_0$ from the
practical procedures with the isotonic sine functions and different values of $\sigma$, $d_0$ and $n$. The right
panel shows the average lengths of the 95% confidence intervals for $d_0$.

The data for this study were extracted from the government website www.fueleconomy.gov 
that includes all FE data for all 2012 car models available to US consumers. This data set contains 
the unadjusted city, highway and combined fuel efficiency for 3979 models, together with their horse 
power. We collected the horse power data for 1477 non-hybrid vehicles with automatic transmission 
gearboxes and natural aspiration engines (i.e. excluding turbo engines and plug-in hybrid vehicles)
in order to have a relatively homogeneous data set.

Table 1: Data Analysis Results for the Five Practical Procedures

Procedure     Estimator   "Bias"   95% CI                Coverage   Length    n
POSIRP-Wald   165.022     21.978   [151.595, 178.450]    No         26.855    80
POSIRP-LR     165.022     21.978   [135.887, 301.439]    Yes        165.552   80
PTSIRP-Wald   213.221     26.221   [205.042, 221.400]    No         16.358    40,40
PTSIRP-LR     194.557     7.557    [148.812, 225.955]    Yes        77.143    40,40
PABLTSP       169.509     17.491   [145.982, 172.964]    No         26.982    40,40

The objective of our analysis is to estimate the model FE = $m$(HP) (or HP =
$m^{-1}$(FE)) and then identify the horse power at which the combined FE is equal to 30 MPG,
around the 2011 CAFE standard. Hence, we are interested in estimating $d_0 = m^{-1}(30)$.

The scatter plot in the left panel of Figure 1 shows the combined FE of these 1477 vehicles as a 
function of their horse power and indicates a decreasing relationship. Notice that there are multiple 
vehicle models with the same horse power, but different FE. To simplify the analysis, we add a 
small jitter to the original horse power to obtain a unique horsepower for every FE observation, 
whose scatterplot is given in the right panel of Figure 1. The jitter added is between ±1 to ensure 
that the ordering of samples by horsepower remains unchanged. 

Given that this is an 'observed' data set, we will emulate the design setting (for a similar strategy 
see also Lan et al. (2009)) for a budget of size 80. Both one stage and two stage procedures will 
be examined. For one stage procedures, 80 horse powers equally spaced are originally selected and 
the closest ones in the data constitute the final covariate values, together with the corresponding 
responses. For two stage procedures, we select a portion $p = 0.5$ in the first stage and hence
select 40 horse powers in the first stage as previously described. After obtaining the second stage
sampling interval $(L_1, U_1)$, we choose the remaining 40 points with an analogous strategy. (Given the
relatively modest budget, we chose not to use the asymptotically optimal allocation of 25% + 75%.)
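
The emulation of a designed experiment on observed data can be sketched as follows. The helper `emulate_design` is hypothetical; it assumes that each observation may be selected at most once and that ties in distance are broken by position in the data, details the description above leaves open.

```python
def emulate_design(observed_x, lo, hi, k):
    """Indices of the observations closest to k equally spaced targets on [lo, hi].

    Assumption: observations are chosen without replacement, so k must not
    exceed the number of observations.
    """
    step = (hi - lo) / (k - 1) if k > 1 else 0.0
    available = set(range(len(observed_x)))
    chosen = []
    for j in range(k):
        target = lo + j * step
        # Closest remaining observation; ties broken by the smaller index.
        best = min(available, key=lambda i: (abs(observed_x[i] - target), i))
        chosen.append(best)
        available.remove(best)
    return chosen
```

For a one stage design, `lo` and `hi` would be the range of the observed horse powers and `k = 80`; for the second stage, they would be $L_1$ and $U_1$ with `k = 40`. The responses paired with the returned indices then play the role of the designed sample.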

Finally, the "true" value of $d_0$ is obtained by using isotonic regression on the entire sample of
1477 observations and is estimated to be around 187.

The five procedures considered are: POSIRP-Wald, POSIRP-LR, PTSIRP-Wald, PTSIRP-LR
and PABLTSP. The fitted models are shown in Figure 7 and the confidence intervals obtained are
summarized in Table 1. It is interesting to note that only the LR based procedures produce
confidence intervals that cover the "true value." The two-stage procedure PTSIRP-LR achieves
much shorter interval lengths. The PTSIRP-Wald CIs are too short, resulting in their missing the
"true value."

Figure 7: The top panels show one stage procedures: POSIRP-Wald and POSIRP-LR. The bottom
panels show two stage procedures: PTSIRP-Wald, PTSIRP-LR and PABLTSP. Numbers denote first and
second stage samples, vertical lines denote corresponding confidence intervals, and 'X' marks the
final point estimate.

Additional results from utilizing different allocations in the two stages are provided in Figure
3 of the Supplementary material, where we see that in almost all cases PTSIRP-LR tends to cover
the "true" value of 187 with better point estimates. POSIRP-Wald and PTSIRP-Wald continue to
struggle due to estimation difficulties with $m'(d_0)$.

Next, we perform another experiment to assess the reliability of the procedures with the FE
data set. We treat the data from the 1477 vehicles as the population, and sample from it according
to different overall budgets of size 20, 30, 40, 50, 60, 70, 80, 90, 100. Given the modest budgets,
we combine samples from both stages for estimation of auxiliary parameters and inversion of the
likelihood ratio. The results, averaged over 500 repetitions for each budget size, are depicted in
Figure 8. Note that POSIRP-Wald and PTSIRP-Wald still struggle to maintain coverage rates
due to difficulties of auxiliary parameter estimation. The local linear approximation in PABLTSP
also faces difficulties with such small budgets. On the other hand, POSIRP-LR and PTSIRP-LR
maintain good coverage for all budgets. As noted before, PTSIRP-LR outperforms its one-stage
counterpart with narrower intervals. Considering the overall budgets investigated in this experiment,
PTSIRP-LR performs well with an extremely small fraction of the overall data, illustrating
its utility in the context of very large data sets, a topic taken up further in the Discussion
section.

Finally, we return to the task discussed in the introductory section of estimating the horsepower 
$d_0 = m^{-1}(34)$ at which the combined FE equals the upcoming 2016 CAFE standard. 
Employing isotonic regression on the entire sample yields a "true" value of $d_0$ of around 155, with 
corresponding 95% confidence interval [143.360, 166.052]. Following the same procedure and budget 
allocations as above, PTSIRP-LR yields an estimate of $d_0$ equal to 166.204 with corresponding 95% 
confidence interval [145.599, 175.450], as shown in Figure 9. PTSIRP-LR covers the "true" value with 
a reasonably sized interval, while utilizing a small fraction of the overall budget. 

The upshot of the analysis is that the two stage LR based procedure offers superior perfor- 
mance to its competitors even with smaller budgets and when the underlying function exhibits 
nonlinearities. 

Figure 8: Results obtained after combining both stage samples. The top panel shows the coverage 
rate and average length of confidence intervals generated by the five different procedures. The 
bottom panel shows the distance of the point estimate to the "true" value, and the distance of the 
derivative estimate to its "true" value. 

Figure 9: PTSIRP-LR results for estimating $d_0 = m^{-1}(34)$, the horsepower corresponding to the 
2016 CAFE standard. Numbers denote first and second stage samples, vertical lines denote 
corresponding confidence intervals, and 'X' marks the final point estimate. 

5. DISCUSSION AND CONCLUDING REMARKS 
In this paper, we considered the estimation of the inverse of a monotone regression function at 
a given point in a design setting. The results strongly suggest that a two-stage procedure using 
isotonic regression in both stages coupled with calculation of likelihood-ratio based confidence 
intervals is agnostic to the local structure in the vicinity of the parameter of interest, requires 
minimal tuning and exhibits superior performance. 

The reader may wonder whether an alternative nonparametric procedure at stage one, with a 
faster than the $n^{1/3}$ convergence rate of isotonic regression, may offer advantages over the proposed 
strategy. We have investigated smoothed isotonic regression (Tang, 2011) which, in a single stage, 
exhibits a convergence rate of $n^{2/5}$ and when repeated in the second stage exhibits the same 
acceleration pattern as isotonic regression provided the bandwidth is appropriately chosen (for a 
detailed discussion of this subtle issue see Tang (2011)). However, extensive numerical work shows 
that no significant performance gains are realized, compared to using isotonic regression in both 
stages, while at the same time a bandwidth parameter needs to be carefully specified. Indeed, a 
strategy based on isotonic regression in the first stage, followed by smoothed isotonic regression in 
the second stage struggles with the estimation of the iso-sine function presented in Figure 2. 

Finally, we should note that although the developed methodology applies to design settings 
(where the investigator can select the desired covariate and the corresponding response variable 
values), it can also prove useful in the context of very large data sets. Suppose that one is interested 
in estimating a threshold of a monotone function from a very large data set that cannot be 
stored in its entirety in computer memory. In that case, one-stage estimation based on the entire 
data set is computationally challenging, since it requires appropriate modification of the standard 
algorithms. However, by adopting the proposed adaptive design framework, one can overcome 
such computational difficulties, while still obtaining a high degree of accuracy. It is our belief 
that by going to multiple stages, if necessary, with judiciously chosen parameters, one can match 
the performance of the estimator based on all data that could be stored in computer memory, 
thus providing a computationally efficient procedure that avoids major modifications of existing 
algorithms. The latter claim is supported by the results of the experiment shown in Figure 8, and 
in Figure 4 of the Supplementary material, which indicates that PTSIRP-LR reduces computing 
time by substantial amounts at larger budgets compared to its one-stage counterpart. 

APPENDIX A. PROOFS 

We discuss Propositions 1 and 2 of the paper. 

A.1 Appendix of TSIRP 

We introduce the following idealized two-stage isotonic regression procedure (ITSIRP): 

1. Set the first-stage sample proportion $p \in (0, 1)$ and let the first and second-stage sample sizes 
be $n_1 = \lfloor np \rfloor$ and $n_2 = n - n_1$, respectively, where $n$ is the total sample size. 

2. Let the ideal second-stage sampling interval be $[L_{1I}, U_{1I}] = [d_0 \pm C_1 n_1^{-\gamma_1}]$ with $C_1 > 0$ and 
$\gamma_1 > 0$. 

3. Allocate the second-stage design points $\{X_{2,i}\}_{i=1}^{n_2}$ according to a Lebesgue density $g_2$ on 
$[L_{1I}, U_{1I}]$, given by $g_2(x) = (C_1 n_1^{-\gamma_1})^{-1} \psi\big((x - d_0)/(C_1 n_1^{-\gamma_1})\big)$. Denote the corresponding i.i.d. 
second-stage responses by $\{Y_{2,i}\}_{i=1}^{n_2}$. 

4. Compute the unconstrained isotonic regression $\hat{m}_{0I}$ (and the constrained one $\hat{m}_{0Ic}$ under the 
null hypothesis $m^{-1}(\theta_0) = d_0$) of $m$ over $[L_{1I}, U_{1I}]$ from the second-stage data. 

5. Obtain $\hat{d}_{0I} = \hat{m}_{0I}^{-1}(\theta_0)$ and $2\log\lambda_{0I} = 2\log\lambda_{0I}(d_0) = 2\,[\,l_n(\hat{m}_{0I}, \hat{\sigma}) - l_n(\hat{m}_{0Ic}, \hat{\sigma})\,]$, the ideal 
second-stage isotonic regression based estimator of $d_0$ and likelihood ratio statistic under 
$H_0 : m^{-1}(\theta_0) = d_0$; here, as before, $\hat{\sigma}$ is a consistent estimate of $\sigma$. 
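The five steps above can be sketched in code. The following is a minimal illustration and not the authors' implementation: the function names (`pava`, `isotonic_inverse`, `two_stage_estimate`), the uniform design on $[0, 1]$, the Gaussian noise, and the default values of `p`, `gamma1` and `c1` are all assumptions made here for the example. Passing `ideal_center=d0` centers the second-stage interval at the true $d_0$ (the idealized ITSIRP); leaving it as `None` centers the interval at the first-stage estimate instead (as in TSIRP).

```python
import random

def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y (unit weights)."""
    blocks = []  # each block is [sum of responses, number of points]
    for v in y:
        blocks.append([v, 1])
        # Merge while the previous block mean exceeds the current block mean
        # (cross-multiplied to avoid division).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def isotonic_inverse(x, fit, theta0):
    """Smallest design point whose fitted value is >= theta0 (one convention for the inverse)."""
    for xi, fi in zip(x, fit):
        if fi >= theta0:
            return xi
    return x[-1]  # fitted curve never reaches theta0: fall back to the right end

def sample_stage(m, lo, hi, n, sigma, rng):
    """Draw n uniform design points on [lo, hi] and noisy responses m(x) + N(0, sigma^2)."""
    xs = sorted(rng.uniform(lo, hi) for _ in range(n))
    ys = [m(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

def two_stage_estimate(m, theta0, n, p=0.5, gamma1=0.25, c1=1.0,
                       sigma=0.1, seed=0, ideal_center=None):
    rng = random.Random(seed)
    n1 = int(n * p)          # Step 1: n1 = floor(n p), n2 = n - n1
    n2 = n - n1
    x1, y1 = sample_stage(m, 0.0, 1.0, n1, sigma, rng)
    d1 = isotonic_inverse(x1, pava(y1), theta0)   # first-stage estimate
    center = d1 if ideal_center is None else ideal_center
    half = c1 * n1 ** (-gamma1)                   # Step 2: half-width C1 * n1^(-gamma1)
    lo, hi = max(0.0, center - half), min(1.0, center + half)
    x2, y2 = sample_stage(m, lo, hi, n2, sigma, rng)   # Step 3: second-stage design
    d2 = isotonic_inverse(x2, pava(y2), theta0)        # Steps 4 and 5: refit and invert
    return d1, (lo, hi), d2
```

For example, `two_stage_estimate(lambda x: x, 0.5, 400)` returns the first-stage estimate, the second-stage sampling interval, and the second-stage estimate for the identity regression function with threshold 0.5. The likelihood ratio statistic of Step 5 is not computed in this sketch.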

Remark: Note that ITSIRP is similar to TSIRP except that in ITSIRP the second-stage sampling 
interval is centered at $d_0$ instead of at $\hat{d}_{1,I}$; therefore, the sampling density at Stage 2 in ITSIRP is $\psi$ 
renormalized to $[L_{1I}, U_{1I}]$, just as $g_2$ is $\psi$ renormalized to $[L_1, U_1]$ (see Proposition 1) in TSIRP. Since 
$\hat{d}_{1,I}$ converges to $d_0$ at rate $n^{1/3}$, which is faster than the rate at which $[L_1, U_1]$ is decreasing around 
$\hat{d}_{1,I}$ (since $\gamma_1 < 1/3$), $[L_1, U_1]$ is essentially indistinguishable from its idealized counterpart $[L_{1I}, U_{1I}]$ 
and the asymptotic behavior of $\hat{d}_{0I}$ will be identical to that of $\hat{d}_{2,I}$. Similarly, the asymptotic 
distribution of the idealized LRT, $2\log\lambda_{0I}$, will be the same as that of $2\log\lambda_{2,I}$. 

A rigorous proof of Proposition 1, formalizing the intuition above, can be provided via con- 
ditioning arguments similar in spirit to those used for proving Theorem 2 in Lan (2007) that 
establishes the distributional convergence of the two-stage estimator of a change-point in a regres- 
sion model; more specifically, the proof of Lemma 3.2 (a key intermediate step in proving Theorem 
2) of a process convergence result proceeds by conditioning on the values of the relevant estimates 
at Stage 1 in conjunction with some uniformity arguments. Proposition 2 requires similar 
conditioning strategies. In this paper, we provide a sketch of the derivations of the limiting distributions 
of the 'idealized' (surrogate) quantities $\hat{d}_{0I}$ and $2\log\lambda_{0I}$. We first introduce the following quantities. 

For positive constants $a, b$ we define the process $X_{a,b}(t) = a W(t) + b t^2$, where $W(t)$ is two-sided 
Brownian motion on $\mathbb{R}$, starting from 0. For a function $f$ defined on $\mathbb{R}$, let $\mathrm{slogcm}(f, I)$ denote the 
left slope of the greatest convex minorant of the restriction of $f$ to the interval $I$. Define 
$g_{a,b}(t) = \mathrm{slogcm}(X_{a,b}, \mathbb{R})(t)$ and 
\[
g^0_{a,b}(t) = \{\mathrm{slogcm}(X_{a,b}, (-\infty, 0])(t) \wedge 0\}\, 1(t < 0) + \{\mathrm{slogcm}(X_{a,b}, (0, \infty))(t) \vee 0\}\, 1(t > 0).
\]
Define $\mathbb{D} = \int \{(g_{1,1}(t))^2 - (g^0_{1,1}(t))^2\}\, dt$ and recall that $Z = \operatorname{argmin}_t X_{1,1}(t)$ is the 
Chernoff random variable. 

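Theorem 3. Under Assumption A, we have 
\[
n^{(1+\gamma_1)/3}(\hat{d}_{0I} - d_0) \xrightarrow{d} C_{d_{0I}} Z,
\]
where 
\[
C_{d_{0I}} = \left( \frac{4\sigma^2 C_1}{m'(d_0)^2\,(1-p)\,p^{\gamma_1}\,\psi(0)} \right)^{1/3}.
\]

Theorem 4. Under Assumption A and the null hypothesis $H_0 : m^{-1}(\theta_0) = d_0$, $2\log\lambda_{0I} \xrightarrow{d} \mathbb{D}$. 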

Proof-sketch of Theorem 3. For every $x \in \mathbb{R}$, 
\begin{align}
P\big(n_2^{(1+\gamma_1)/3}(\hat{d}_{0I} - d_0) \le x\big)
&= P\big(n_2^{(1+\gamma_1)/3}(\hat{m}_{0I}(d_0 + x n_2^{-(1+\gamma_1)/3}) - \theta_0) \ge 0\big) \tag{A1} \\
&= P\big(n_2^{(1+\gamma_1)/3}(\hat{m}_{0I}(d_0 + x n_2^{-(1+\gamma_1)/3}) - m(d_0)) \ge 0\big). \tag{A2}
\end{align}
Thus, it is sufficient to derive the limiting distribution of $n_2^{(1+\gamma_1)/3}(\hat{m}_{0I}(d_0 + x n_2^{-(1+\gamma_1)/3}) - m(d_0))$. 
Deducing this limit involves three main steps: the first uses a switching relationship to change the 
original problem into an M-Estimation problem; the second solves the M-Estimation problem in 
the framework of empirical process theory; the third simplifies the final limit distribution. This 
approach is, by now, standard in dealing with the asymptotics of isotonic estimates; see, for example, 
pages 296-299 of van der Vaart and Wellner (1996). Without loss of generality, we take $[a, b] = [0, 1]$ 
from here on. 

In the first step, we show 

Lemma 5. For $t \in [d_0 \pm C_1 n_1^{-\gamma_1}]$ and $s \in \mathbb{R}$, 
\[
\hat{m}_{0I}(t) \le s \iff \operatorname*{argmin}_{x \in [d_0 \pm C_1 n_1^{-\gamma_1}]} \{V_{n_2}(x) - s\,G_{n_2}(x)\} \ge T(t), \tag{A3}
\]
where 
\[
V_{n_2}(x) = \frac{1}{n_2} \sum_{i=1}^{n_2} Y_{2,i}\, 1(X_{2,i} \le x), \qquad
G_{n_2}(x) = \frac{1}{n_2} \sum_{i=1}^{n_2} 1(X_{2,i} \le x), \tag{A4}
\]
and $T(t)$ is the largest $X_{2,i}$ less than or equal to $t$. 



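This equivalence is called the 'switching relationship' and can be derived by arguments similar 
to those leading to the last display on page 298 of van der Vaart and Wellner (1996). Hence, writing 
$u$ for the argument of the minimization to keep it distinct from the fixed $x$, 
\begin{align*}
& P\big(n_2^{(1+\gamma_1)/3}(\hat{m}_{0I}(d_0 + x n_2^{-(1+\gamma_1)/3}) - m(d_0)) \le z\big) \\
&\quad = P\big(\hat{m}_{0I}(d_0 + x n_2^{-(1+\gamma_1)/3}) \le m(d_0) + z n_2^{-(1+\gamma_1)/3}\big) \\
&\quad = P\Big(\operatorname*{argmin}_{u \in [d_0 \pm C_1 n_1^{-\gamma_1}]} \big\{V_{n_2}(u) - (m(d_0) + z n_2^{-(1+\gamma_1)/3})\, G_{n_2}(u)\big\} \ge T(d_0 + x n_2^{-(1+\gamma_1)/3})\Big).
\end{align*}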
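The switching relationship of Lemma 5 can be checked exactly on small samples. The sketch below is illustrative only: it works with the order statistics (index $k$ stands for the $k$-th smallest design point), uses unnormalized partial sums $S_j = \sum_{l \le j} (Y_{(l)} - s)$ in place of $V_{n_2} - s\,G_{n_2}$ (the factor $1/n_2$ does not affect the argmin), and assumes the convention that the argmin is the largest minimizer over $j \in \{0, 1, \ldots, n\}$ with $S_0 = 0$. The function names are hypothetical, and exact rational arithmetic is used to avoid floating-point ties.

```python
from fractions import Fraction
from itertools import product

def pava(y):
    """Isotonic (nondecreasing) least-squares fit with exact rational arithmetic."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([Fraction(v), 1])
        # Merge while the previous block mean exceeds the current block mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def largest_argmin_index(y, s):
    """Largest j in {0, ..., n} minimising S_j = sum_{l <= j} (y_l - s), with S_0 = 0."""
    best_j, best, run = 0, Fraction(0), Fraction(0)
    for j, v in enumerate(y, start=1):
        run += v - s
        if run <= best:  # '<=' keeps the largest minimiser in case of ties
            best_j, best = j, run
    return best_j

def check_switching(y, s):
    """True if fit_k <= s holds exactly when the argmin index is >= k, for every k."""
    fit = pava(y)
    j_star = largest_argmin_index(y, s)
    return all((fit[k - 1] <= s) == (j_star >= k) for k in range(1, len(y) + 1))
```

Running `check_switching` over all response vectors in $\{0, 1, 2\}^4$ and a grid of values of $s$ confirms the equivalence under the stated convention.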

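In the second step, by arguments similar to those on page 299 of van der Vaart and Wellner (1996), 
we establish: 

Lemma 6. Under Assumption (A), as $n \to \infty$, 
\begin{align*}
& P\Big(\operatorname*{argmin}_{u \in [d_0 \pm C_1 n_1^{-\gamma_1}]} \big\{V_{n_2}(u) - (m(d_0) + z n_2^{-(1+\gamma_1)/3})\, G_{n_2}(u)\big\} \ge T(d_0 + x n_2^{-(1+\gamma_1)/3})\Big) \\
&\quad \to P\Big(\operatorname*{argmin}_{h \in \mathbb{R}} \{X_{c,d}(h) - z h\} \ge x\Big),
\end{align*}
where $c = \big(C_1 \sigma^2 ((1-p)/p)^{\gamma_1} / \psi(0)\big)^{1/2}$ and $d = m'(d_0)/2$. 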

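In the third step, we use another switching relationship, namely: 
\[
g_{c,d}(x) \ge \lambda \iff \operatorname*{argmin}_t \{X_{c,d}(t) - \lambda t\} \le x, \quad \text{for } \lambda \in \mathbb{R}, \tag{A5}
\]
and the continuity of the random variables involved in the above display, to get: 
\[
P\Big(\operatorname*{argmin}_{h \in \mathbb{R}} \{X_{c,d}(h) - z h\} \ge x\Big) = P\big(g_{c,d}(x) \le z\big).
\]
Hence, 
\[
n_2^{(1+\gamma_1)/3}\big(\hat{m}_{0I}(d_0 + x n_2^{-(1+\gamma_1)/3}) - m(d_0)\big) \xrightarrow{d} g_{c,d}(x). \tag{A6}
\]
It follows from (A1) that 
\[
P\big(n_2^{(1+\gamma_1)/3}(\hat{d}_{0I} - d_0) \le x\big) \to P\big(g_{c,d}(x) \ge 0\big). \tag{A7}
\]
Then, using (A5) again, we have 
\[
P\big(n_2^{(1+\gamma_1)/3}(\hat{d}_{0I} - d_0) \le x\big) \to P\Big(\operatorname*{argmin}_t X_{c,d}(t) \le x\Big). \tag{A8}
\]
Now, from Problem 5 on page 308 of van der Vaart and Wellner (1996), we have 
$\operatorname{argmin}_h X_{c,d}(h) \stackrel{d}{=} (c/d)^{2/3} \operatorname{argmin}_t X_{1,1}(t)$, whence 
\[
P\big(n_2^{(1+\gamma_1)/3}(\hat{d}_{0I} - d_0) \le x\big) \to P\Big((c/d)^{2/3} \operatorname*{argmin}_t X_{1,1}(t) \le x\Big). \tag{A9}
\]
Since 
\[
(c/d)^{2/3} = \left( \frac{4\sigma^2 (1-p)^{\gamma_1} C_1}{m'(d_0)^2\, p^{\gamma_1}\, \psi(0)} \right)^{1/3},
\]
we have 
\[
n_2^{(1+\gamma_1)/3}(\hat{d}_{0I} - d_0) \xrightarrow{d} \left( \frac{4\sigma^2 (1-p)^{\gamma_1} C_1}{m'(d_0)^2\, p^{\gamma_1}\, \psi(0)} \right)^{1/3} Z,
\]
which leads to $n^{(1+\gamma_1)/3}(\hat{d}_{0I} - d_0) \xrightarrow{d} C_{d_{0I}} Z$, the result in Theorem 3. $\square$ 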
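The scaling identity quoted from van der Vaart and Wellner (1996) and the passage from $n_2$ to $n$ in the last step can be verified in a few lines. The following worked derivation is a sketch added for illustration rather than part of the original argument; the auxiliary symbol $\alpha$ is introduced here, and $n_2 = (1-p)n$ is used with integer rounding ignored.

```latex
% Brownian scaling: W(\alpha t) \stackrel{d}{=} \alpha^{1/2} W(t) as processes in t, for \alpha > 0.
\begin{align*}
X_{c,d}(\alpha t) &= c\,W(\alpha t) + d\,\alpha^2 t^2
  \stackrel{d}{=} c\,\alpha^{1/2}\, W(t) + d\,\alpha^2 t^2 .
\intertext{Choosing $\alpha = (c/d)^{2/3}$ gives $c\,\alpha^{1/2} = d\,\alpha^{2}$, so}
X_{c,d}(\alpha t) &\stackrel{d}{=} c\,\alpha^{1/2}\, X_{1,1}(t),
\qquad\text{hence}\qquad
\operatorname*{argmin}_h X_{c,d}(h) \stackrel{d}{=} \alpha \operatorname*{argmin}_t X_{1,1}(t) = (c/d)^{2/3} Z .
\intertext{With $n_2 = (1-p)\,n$,}
n^{(1+\gamma_1)/3} &= (1-p)^{-(1+\gamma_1)/3}\, n_2^{(1+\gamma_1)/3},
\\
(1-p)^{-(1+\gamma_1)/3}\, (c/d)^{2/3}
&= \left( \frac{4\sigma^2 C_1 (1-p)^{\gamma_1}}{m'(d_0)^2\, p^{\gamma_1}\, \psi(0)\, (1-p)^{1+\gamma_1}} \right)^{1/3}
 = \left( \frac{4\sigma^2 C_1}{m'(d_0)^2\, (1-p)\, p^{\gamma_1}\, \psi(0)} \right)^{1/3} .
\end{align*}
```

The final expression is the limiting constant multiplying $Z$ when the normalization uses the total sample size $n$ rather than $n_2$.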

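Proof-sketch of Theorem 4. For simplicity, we assume the second-stage sampling density is uniform 
on $[L_{1I}, U_{1I}]$. That is, $g_2(x) = (2 C_1 n_1^{-\gamma_1})^{-1}$ for $x \in [L_{1I}, U_{1I}]$. Then, similar to (A6), we have 
\[
\begin{pmatrix}
n_2^{(1+\gamma_1)/3}\big(\hat{m}_{0I}(d_0 + x n_2^{-(1+\gamma_1)/3}) - m(d_0)\big) \\
n_2^{(1+\gamma_1)/3}\big(\hat{m}_{0Ic}(d_0 + x n_2^{-(1+\gamma_1)/3}) - m(d_0)\big)
\end{pmatrix}
\xrightarrow{d}
\begin{pmatrix} g_{c,d}(x) \\ g^0_{c,d}(x) \end{pmatrix}, \tag{A10}
\]
where now, $c = \big(2 C_1 \sigma^2 [(1-p)/p]^{\gamma_1}\big)^{1/2}$ and $d = m'(d_0)/2$. In fact, the weak convergence (A10) 
holds not only finite dimensionally, but also in the normed linear space $L_2[-K, K] \times L_2[-K, K]$ for 
every $K > 0$, because of the monotonicity of both $\hat{m}_{0I}$ and $\hat{m}_{0Ic}$. 

To derive the asymptotics for $2\log\lambda_{0I}$, it suffices (by Slutsky's theorem) to consider a tweaked 
version of this quantity with the $\hat{\sigma}^2$ in the denominator replaced by the true $\sigma^2$. In what follows, 
we work with this version and continue to call it $2\log\lambda_{0I}$. We have: 
\begin{align*}
2\log\lambda_{0I}
&= \frac{1}{\sigma^2} \Big\{ \sum_{i=1}^{n_2} (Y_{2,i} - \hat{m}_{0Ic}(X_{2,i}))^2 - \sum_{i=1}^{n_2} (Y_{2,i} - \hat{m}_{0I}(X_{2,i}))^2 \Big\} \\
&= \frac{1}{\sigma^2} \Big\{ \sum_{i=1}^{n_2} \big[(Y_{2,i} - \theta_0) - (\hat{m}_{0Ic}(X_{2,i}) - \theta_0)\big]^2 - \sum_{i=1}^{n_2} \big[(Y_{2,i} - \theta_0) - (\hat{m}_{0I}(X_{2,i}) - \theta_0)\big]^2 \Big\} \\
&= -\frac{2}{\sigma^2} \Big\{ \sum_{i=1}^{n_2} (Y_{2,i} - \theta_0)(\hat{m}_{0Ic}(X_{2,i}) - \theta_0) - \sum_{i=1}^{n_2} (Y_{2,i} - \theta_0)(\hat{m}_{0I}(X_{2,i}) - \theta_0) \Big\} \\
&\qquad + \frac{1}{\sigma^2} \sum_{i=1}^{n_2} \big[(\hat{m}_{0Ic}(X_{2,i}) - \theta_0)^2 - (\hat{m}_{0I}(X_{2,i}) - \theta_0)^2\big] \\
&= -\frac{2}{\sigma^2} \sum_{i=1}^{n_2} (Y_{2,i} - \hat{m}_{0Ic}(X_{2,i}))(\hat{m}_{0Ic}(X_{2,i}) - \theta_0) - \frac{2}{\sigma^2} \sum_{i=1}^{n_2} (\hat{m}_{0Ic}(X_{2,i}) - \theta_0)^2 \\
&\qquad + \frac{2}{\sigma^2} \sum_{i=1}^{n_2} (Y_{2,i} - \hat{m}_{0I}(X_{2,i}))(\hat{m}_{0I}(X_{2,i}) - \theta_0) + \frac{2}{\sigma^2} \sum_{i=1}^{n_2} (\hat{m}_{0I}(X_{2,i}) - \theta_0)^2 \\
&\qquad + \frac{1}{\sigma^2} \sum_{i=1}^{n_2} \big[(\hat{m}_{0Ic}(X_{2,i}) - \theta_0)^2 - (\hat{m}_{0I}(X_{2,i}) - \theta_0)^2\big] \\
&= \frac{1}{\sigma^2} \sum_{i=1}^{n_2} \big[(\hat{m}_{0I}(X_{2,i}) - \theta_0)^2 - (\hat{m}_{0Ic}(X_{2,i}) - \theta_0)^2\big],
\end{align*}
where the last equation is a consequence of the fact that isotonic regression estimators are formed 
by averaging the responses over blocks of order statistics of the covariates, which ensures that 
\[
\sum_{i=1}^{n_2} (Y_{2,i} - \hat{m}_{0I}(X_{2,i}))(\hat{m}_{0I}(X_{2,i}) - \theta_0)
\quad\text{and}\quad
\sum_{i=1}^{n_2} (Y_{2,i} - \hat{m}_{0Ic}(X_{2,i}))(\hat{m}_{0Ic}(X_{2,i}) - \theta_0)
\]
are both equal to 0. 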
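The algebraic identity above can be verified exactly on a toy data set. The sketch below is an illustration under stated assumptions, not the authors' code: it takes $\sigma^2 = 1$, assumes the responses are already ordered by their covariates, and computes the constrained fit by truncating separate isotonic fits at $\theta_0$ (at most $\theta_0$ to the left of $d_0$, at least $\theta_0$ to the right), which is the standard form of the constrained least-squares solution. The names `constrained_fit` and `lr_two_ways`, and the argument `n_left` (the number of design points not exceeding $d_0$), are hypothetical.

```python
from fractions import Fraction

def pava(y):
    """Isotonic (nondecreasing) least-squares fit with exact rational arithmetic."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([Fraction(v), 1])
        # Merge while the previous block mean exceeds the current block mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def constrained_fit(y, n_left, theta0):
    """Isotonic fit under the constraint: fit <= theta0 on the left, >= theta0 on the right."""
    left = [min(v, theta0) for v in pava(y[:n_left])]
    right = [max(v, theta0) for v in pava(y[n_left:])]
    return left + right

def lr_two_ways(y, n_left, theta0):
    """Return (RSS_constrained - RSS_unconstrained, sum[(m - theta0)^2 - (mc - theta0)^2])."""
    m = pava(y)
    mc = constrained_fit(y, n_left, theta0)
    rss_diff = sum((a - b) ** 2 for a, b in zip(y, mc)) - sum((a - b) ** 2 for a, b in zip(y, m))
    fit_diff = sum((a - theta0) ** 2 - (b - theta0) ** 2 for a, b in zip(m, mc))
    return rss_diff, fit_diff
```

For the responses `[3, 1, 2, 5, 4, 6]` with `n_left = 3` and `theta0 = 5`, the unconstrained fit is `[2, 2, 2, 4.5, 4.5, 6]`, the constrained fit is `[2, 2, 2, 5, 5, 6]`, and both expressions equal 1/2.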

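Now denote by $\mathbb{P}_{n_2}$ the empirical measure of the second-stage covariates $\{X_{2,i}\}_{i=1}^{n_2}$ and by $P_{n_2}$ 
the corresponding uniform probability measure of $X_{2,i}$. Let $D_{n_2}$ denote the interval on which $\hat{m}_{0I}$ 
and $\hat{m}_{0Ic}$ differ. Then, we have 
\[
2\log\lambda_{0I} = \frac{n_2}{\sigma^2}\, \mathbb{P}_{n_2} \Big[ \big\{(\hat{m}_{0I}(x) - \theta_0)^2 - (\hat{m}_{0Ic}(x) - \theta_0)^2\big\}\, 1\{x \in D_{n_2}\} \Big] = T_1 + T_2,
\]
where 
\begin{align*}
T_1 &= \frac{n_2}{\sigma^2}\, (\mathbb{P}_{n_2} - P_{n_2}) \Big[ \big\{(\hat{m}_{0I}(x) - \theta_0)^2 - (\hat{m}_{0Ic}(x) - \theta_0)^2\big\}\, 1\{x \in D_{n_2}\} \Big], \\
T_2 &= \frac{n_2}{\sigma^2}\, P_{n_2} \Big[ \big\{(\hat{m}_{0I}(x) - \theta_0)^2 - (\hat{m}_{0Ic}(x) - \theta_0)^2\big\}\, 1\{x \in D_{n_2}\} \Big].
\end{align*}
Since $(1 - 2\gamma_1)/3 < 1/2$, both $(\hat{m}_{0I}(x) - \theta_0)$ and $(\hat{m}_{0Ic}(x) - \theta_0)$ are $O_p(n_2^{-(1+\gamma_1)/3})$, and 
\[
T_1 = \frac{1}{\sigma^2}\, n_2^{(1-2\gamma_1)/3}\, (\mathbb{P}_{n_2} - P_{n_2}) \Big[ \Big\{ \big(n_2^{(1+\gamma_1)/3}(\hat{m}_{0I}(x) - \theta_0)\big)^2 - \big(n_2^{(1+\gamma_1)/3}(\hat{m}_{0Ic}(x) - \theta_0)\big)^2 \Big\}\, 1\{x \in D_{n_2}\} \Big],
\]
we can show that $T_1$ converges to 0 in probability by empirical process theory arguments. 
Next, $T_2$ is given by 
\begin{align*}
T_2 &= \frac{n_2}{\sigma^2} \int_{D_{n_2}} \big[(\hat{m}_{0I}(x) - \theta_0)^2 - (\hat{m}_{0Ic}(x) - \theta_0)^2\big]\, \frac{1}{2 C_1 n_1^{-\gamma_1}}\, dx \\
&= \frac{1}{2 C_1 \sigma^2} \left(\frac{1-p}{p}\right)^{-\gamma_1} n_2^{1+\gamma_1} \int_{D_{n_2}} \big[(\hat{m}_{0I}(x) - \theta_0)^2 - (\hat{m}_{0Ic}(x) - \theta_0)^2\big]\, dx \\
&= c^{-2}\, n_2^{1+\gamma_1} \int_{D_{n_2}} \big[(\hat{m}_{0I}(x) - \theta_0)^2 - (\hat{m}_{0Ic}(x) - \theta_0)^2\big]\, dx \\
&= c^{-2} \int_{n_2^{(1+\gamma_1)/3}(D_{n_2} - d_0)} \Big[ \big(n_2^{(1+\gamma_1)/3}(\hat{m}_{0I}(d_0 + t n_2^{-(1+\gamma_1)/3}) - \theta_0)\big)^2 - \big(n_2^{(1+\gamma_1)/3}(\hat{m}_{0Ic}(d_0 + t n_2^{-(1+\gamma_1)/3}) - \theta_0)\big)^2 \Big]\, dt \\
&\xrightarrow{d} \frac{1}{c^2} \int \big[(g_{c,d}(t))^2 - (g^0_{c,d}(t))^2\big]\, dt \\
&\stackrel{d}{=} \mathbb{D}.
\end{align*}
The equality preceding the weak convergence above follows from the change of variable 
$x = d_0 + t n_2^{-(1+\gamma_1)/3}$, and the weak convergence of the likelihood ratio statistic follows from the weak 
convergence result (A10) in the $L_2$ sense and the fact that the set $n_2^{(1+\gamma_1)/3}(D_{n_2} - d_0)$ is contained 
in a compact set with (arbitrarily) high probability, eventually. For the very last equality, see, for 
example, the proof of Theorem 2.2 of Banerjee (2007). Thus, Theorem 4 holds. 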

$\square$ 

S1. MULTISTAGE ADAPTIVE PROCEDURE AND ITS CONVERGENCE RATE 

Next, we discuss whether the parametric convergence rate of $\sqrt{n}$ can be achieved by using 
a multistage sampling procedure (with more than 2 stages) that constructs an isotonic 
regression estimate of $d_0$ at each stage. So, consider the generic adaptive procedure described 
at the beginning of Section 2.2 where we obtain $\hat{d}_{2,I}$ by IR at Step 4 but instead of finding 
a confidence interval in Step 5, we select a neighborhood of $\hat{d}_{2,I}$, say $[L_2, U_2]$, and continue 
sampling at Stage 3. Of course, in this case, we allocate our budget in proportions $p_1, p_2, p_3$ 
adding up to 1. Now, the convergence rate of $\hat{d}_{2,I}$ for $d_0$ is $n^{(1+\gamma_1)/3}$ and $[L_2, U_2]$ is therefore 
chosen to be of the form $[\hat{d}_{2,I} \pm C_2 n_2^{-\gamma_2}]$ with $\gamma_2 < (1 + \gamma_1)/3$ and $n_2 = n p_2$. Since $\gamma_1 < 1/2$, 
we have $\gamma_2 < 1/2$. Finally, $n_3 = n p_3$ covariate-response pairs are sampled from $[L_2, U_2]$ at 
Stage 3 and the IR procedure is used to come up with a final estimate $\hat{d}_{3,I}$ with convergence 
rate $n^{(1+\gamma_2)/3}$. But, as $(1 + \gamma_2)/3 < 1/2$, this is again slower than $\sqrt{n}$. Following this line 
of argument, it is not difficult to see that no $k$-stage procedure based on IR at each stage 
can produce an estimator of $d_0$ that achieves the parametric rate. Note, also, that we can 
come as close as possible to $\sqrt{n}$ if $k$ is chosen large enough. A $k$-stage procedure involves 
a sequence $(\gamma_1, \gamma_2, \ldots, \gamma_{k-1})$ with $1/2 > \gamma_1$ and $\gamma_{i+1} < (1 + \gamma_i)/3$, for $i \ge 1$, and yields 
a final rate of convergence given by $(1 + \gamma_{k-1})/3$. Now, take some large $k$ and consider a 
procedure where $(1 + \gamma_{i-1})/3 > \gamma_i > (1 + \gamma_{i-1})/3 - \eta$ for some (very small) $\eta > 0$, for 
$i = k, (k-1), \ldots, 2$. Then, using the second inequality time and again, by simple algebra: 
\[
\frac{1 + \gamma_{k-1}}{3} > \left(\frac{1}{3} - \eta\right) \sum_{j=1}^{k-2} \left(\frac{1}{3}\right)^{j} + \frac{1}{3} + \frac{\gamma_1}{3^{k-1}},
\]
which can clearly be made as close to 1/2 as one pleases for small (enough) $\eta$ and large 
(enough) $k$. 
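The recursion behind this argument is easy to iterate numerically. The sketch below is illustrative: the function name `final_rate_exponent` is hypothetical, and each $\gamma_i$ is set equal to the lower bound $(1 + \gamma_{i-1})/3 - \eta$, which gives a conservative value for the final rate exponent $(1 + \gamma_{k-1})/3$. Under that choice the exponents converge to the fixed point $\gamma^* = 1/2 - 3\eta/2$, so the final rate exponent tends to $1/2 - \eta/2$.

```python
def final_rate_exponent(k, eta, gamma1):
    """Exponent (1 + gamma_{k-1}) / 3 of a k-stage plan, k >= 2.

    Each gamma_i (i = 2, ..., k-1) is set to the lower bound
    (1 + gamma_{i-1}) / 3 - eta, an assumption made for illustration.
    """
    gamma = gamma1
    for _ in range(k - 2):          # builds gamma_2, ..., gamma_{k-1}
        gamma = (1.0 + gamma) / 3.0 - eta
    return (1.0 + gamma) / 3.0

# Exponents for an increasing number of stages, starting from gamma_1 = 0.25.
exponents = [final_rate_exponent(k, 1e-3, 0.25) for k in (2, 3, 4, 6, 10, 30)]
```

With $\gamma_1 = 0.25$ the two-stage exponent is $5/12$, and the sequence increases with $k$ while staying strictly below the parametric value $1/2$.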

S2. SUPPLEMENTAL MATERIAL FOR PERFORMANCE EVALUATION OF THE ADAPTIVE PROCEDURES 
In this section we present additional results regarding the estimation of $m'(d_0)$. 

We use a local quadratic regression procedure to estimate $m'(d_0)$. An asymptotically 
optimal bandwidth that minimizes the asymptotic MSE is employed for this purpose. As 
expected and shown in Figure 1, the estimator tends to perform well with very large sample 
sizes. However, for the sample sizes considered in our numerical work, the performance 
is unsatisfactory especially under the isotonic sine function. The root mean squared error 
can be substantial, which causes the coverage rates reported in the main text to behave 
erratically. 

If we repeat the procedures with perfect knowledge of the nuisance parameter $m'(d_0)$, then 
coverage rates are about the nominal level of 95% (Figure 2). Therefore, when the underlying 
regression function is well-behaved, we can use the more aggressive PTSIRP-Wald. Otherwise, we 
use the conservative but stable PTSIRP-LR, which avoids derivative estimation. 

Figure 1: This figure shows the root mean squared errors of the estimates of $m'(d_0)$ using 
the local quadratic regression procedure proposed in Fan and Gijbels (1996). The first five 
data points correspond to the sample sizes considered, i.e., 100, 200, ..., 500. 

Figure 2: The left panel shows the coverage rates of the 95% confidence intervals for $d_0 = 0.5$ 
from the one and two stage IR-Wald procedures with the isotonic sine functions and different 
values of $\sigma$ and $n$. The right panel shows the average lengths of the 95% confidence intervals 
for $d_0$. The derivative $m'(d_0)$ and $\sigma$ are assumed known. 



S3. DIFFERENT BUDGET ALLOCATIONS FOR FUEL EFFICIENCY ANALYSIS 
In this section, we repeat the analysis from the main text with different budget specifications. 
In particular, we present in Figure 3 three different scenarios maintaining the total budget 
of 80 samples. 

Note that the one-stage procedures tend to provide poor point estimates, whereas the 
LR based two-stage procedure PTSIRP-LR and the local linear approximation (PABLTSP) 
tend to cover the "true" value of 187 in almost all cases with better point estimates. 

In addition to providing reliable estimates with smaller budgets and 'ill-behaved' func- 
tions, PTSIRP-LR reduces computing time by substantial amounts at larger budgets com- 
pared to its one stage counterpart. Figure 4 shows computing times, averaged over 500 trials, 
with 50% of the overall budget allocated to the first stage for every trial. The Wald and 
local linear procedures can be performed faster, though their efficacy is dubious with smaller 
budgets and with 'ill-behaved' functions because they require auxiliary parameter estimation. 

Altogether, our numerical studies have shown that likelihood ratio based procedures are 
robust, but do require inversion of the likelihood ratio, which adds computing cost. Utilizing 
two stages reduces computing time over POSIRP-LR, while also obtaining tighter confidence 
regions. 

REFERENCES 
Fan, J., and Gijbels, I. (1996), Local polynomial modelling and its applications, Vol. 66 of 
Monographs on Statistics and Applied Probability, London: Chapman & Hall. 

Figure 3: Plots for the data analysis with different budget specifications. 

Figure 4: Average computing times with different budgets for the isotonic sine example. 



